feat: Empty Block Complementing #343
Greptile Summary
This PR implements Empty Block Complementing for Slurm block topology: when accelerator domains have more hosts than
Confidence Score: 5/5
Safe to merge. The complement logic is well-guarded, partition-scoped, and deterministic; all previously flagged issues have been addressed.
All three substantive issues from prior review rounds (stale blockID, nil dereference in initBlocks, cross-partition node contamination) are explicitly fixed and regression-tested. The new packing/complement code is thoroughly unit-tested across edge cases. The only remaining nit is the misleading wording of the error messages in validateBlockSizes.
pkg/engines/slurm/slurm.go: the validateBlockSizes error messages are slightly inaccurate in their wording, but this does not affect correctness.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[toBlockTopology / getBlockTopologyUnit] --> B[complementBlocks]
B --> C{fanoutsPerLevel OK\n& domains != nil?}
C -- No --> Z[return original blocks]
C -- Yes --> D[orderedDomainsForBlocks\nfilter to partition-local nodes]
D --> E{all blocks have\ndomain entries?}
E -- No --> Z
E -- Yes --> F[groupSizeFromOrderedDomains\nsmallest 2^n x base ge maxAccelSize]
F --> G[packOrderedDomainsIntoBlocks\nsplit + pad each domain group]
G --> H{len out != len blocks?}
H -- Yes --> I[expandedBaseBlockSlots\npad to next tree capacity]
H -- No --> J[assignSequentialBlockIDs]
I --> J
J --> K{shouldUseComplementedBlocks?}
K -- hasEmptySlots or len>input --> L[return complemented blocks]
K -- No --> Z
L --> M[refresh nodeInfo.blockID\nfor GetNodeTopologySpec]
Reviews (17): Last reviewed commit: "replace the temporary block tree used fo..."
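The group-size and packing steps named in the flowchart (smallest 2^n x base that is at least the largest accelerator domain size, then split and pad each domain) could be sketched roughly as follows. This is only an illustration based on the flowchart labels: the function names `groupSize` and `packDomain`, their signatures, and the use of a nil slice for an empty block are assumptions, not the PR's actual code in pkg/engines/slurm.

```go
package main

import "fmt"

// groupSize returns the smallest value of the form base*2^n (n >= 0)
// that is greater than or equal to maxDomainSize.
// Hypothetical helper; the PR's groupSizeFromOrderedDomains may differ.
func groupSize(base, maxDomainSize int) int {
	if base <= 0 {
		return 0
	}
	size := base
	for size < maxDomainSize {
		size *= 2
	}
	return size
}

// packDomain splits one domain's hosts into blocks of at most base nodes,
// then appends empty blocks until the domain occupies group/base slots.
// Hypothetical helper; an empty block is represented by a nil slice here.
func packDomain(hosts []string, base, group int) [][]string {
	if base <= 0 {
		return nil
	}
	var blocks [][]string
	for start := 0; start < len(hosts); start += base {
		end := start + base
		if end > len(hosts) {
			end = len(hosts)
		}
		blocks = append(blocks, hosts[start:end])
	}
	for len(blocks) < group/base {
		blocks = append(blocks, nil) // empty placeholder block
	}
	return blocks
}

func main() {
	hosts := []string{"n1", "n2", "n3", "n4", "n5"}
	g := groupSize(2, len(hosts))
	fmt.Println(g, packDomain(hosts, 2, g))
}
```

Under these assumptions, a five-host domain with a base block size of 2 gets a group size of 8, which is four block slots: three populated blocks and one empty block.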
91ad1ef to c122256
f048cb8 to eb4d16a
dba4934 to 7f30438
Signed-off-by: Ravi Shankar <ravish@nvidia.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
+ Coverage   68.46%   71.08%   +2.62%
==========================================
  Files          82       86       +4
  Lines        4842     5219     +377
==========================================
+ Hits         3315     3710     +395
+ Misses       1395     1318      -77
- Partials      132      191      +59
0cdb20b to 8589f77
…lat slot packing Signed-off-by: Dmitry Shmulevich <dshmulevich@nvidia.com>
Description
Empty Block Complementing for Slurm Block Topology.
Checklist
git commit -s).